Abstract. A detailed analysis of correlation between stock returns at high frequency is 
compared with simple models of random walks. We focus in particular on the dependence 
of correlations on time scales - the so-called Epps effect. This provides a characterization 
of stochastic models of stock price returns which is appropriate at very high frequency. 



1. Introduction 

The study of covariances between stocks is a central problem in finance, both to achieve 
theoretical understanding of market structure [1] and to exploit its relevant applications, 
such as portfolio optimization [2]. With the availability of financial high frequency data, 
it has become possible to estimate correlations on very short time scales, down to the 
frequency of individual transactions. As Epps first observed in 1979, the measured 
correlations between stock prices decrease as the sampling frequency of the time series grows [3]. Since 
then other studies on data coming from different stock markets [3] [5] and foreign exchange 
markets [6] [7] evidenced the persistence of this phenomenon, called the Epps effect, across 
different markets. 

Understanding the dependence of financial correlations on time scale has important 
practical consequences for portfolio management. For example, for large portfolios, the 
estimation of risk measures at low frequency (e.g. one day) suffers from instabilities, due 
to the scarcity of data [8] . Estimates of financial correlations - and hence of risk measures 
- at high frequency can rely on much richer and longer time series and can potentially 
detect structural changes more efficiently. Relating the structure of correlations at longer 
time scales to that at shorter time scales provides a means of overcoming the information 
deficiency causing the instability of risk measures. Interestingly, Borghesi et al. [9] found 
that the structure of correlations in groups of very liquid stocks is largely invariant across 
time scales ranging from 5 minutes to one day. This suggests that estimating correlations 
on long time scales from high frequency correlations is in principle feasible. 

Transactions in financial markets play two roles: in principle (i) they impact returns 
causing price movements, but in practice (ii) they also allow prices to be known, fixing the 
market value of a traded security until the next trade takes place. Correspondingly, two 
main contributions to the Epps effect have been considered in the literature so far: the 
first relates the Epps effect to genuine lagged correlations, and it arises from (temporary 
or permanent) impact of individual trades on the price dynamics. The second relates to 
the fact that price dynamics is not synchronous across stocks (i.e. transactions take place 
at different times, in principle, on different stocks).^1 




Both lagged correlations and asynchronous sampling contribute to the Epps effect, but 
the relative weight of these two effects is not always easy to assess (see [12] and [13]). The 
first aspect to be considered is the fact that trading is not synchronous, so that covariance 
estimation is intrinsically problematic at high frequency [2]. Lo and MacKinlay proposed a 
solution to this issue based on a "random censorship" model [15], which was able to explain 
why simple estimators tend to bias correlations towards zero at high frequency (more recent 
works following this line are [16] [17] [18] and [19]). The second factor contributing to the 
Epps effect is the presence of genuinely lagged correlations (lead-lag effect) [20] [21] [22], 
which should contain information about the dynamical structure of the market. 

This paper addresses the issue of disentangling these effects at very high frequency. 
We adopt an approach similar to [15], and use a previous tick estimator (see [23] for an 
analysis of interpolation-based estimators) to check the impact of asynchronous trading 
on correlations, without any specific choice for their genuine structure; alternative choices 
to deal with asynchrony are indeed available (namely [21] and [25]). The performance of 
some popular estimators has been investigated in [26]. 

We discuss a minimal model of price dynamics, which describes an infinitely liquid 
market: a transaction in this scenario has the only effect of revealing the asset price at 
a given instant of time, but sampling has no impact on prices. We find that also in 
this oversimplified scenario transactions can strongly affect correlations; in particular the 
Epps effect is always dominated by the asynchronous sampling at very high frequency. 
We show that it is possible to infer the genuine correlation structure of the market if one 
supposes inter-trade times to be exponentially distributed; in particular we can analytically 
disentangle the contribution to the Epps effect due to asynchronous trading from the one due 
to a genuine lag. 

We apply the model to data of NYSE, finding that some features of the time series of 
returns at very high frequency can successfully be reproduced. In particular, assuming a 
process of asynchronous sampling of correlated random walks, we can estimate the underly- 
ing correlation function. The heterogeneity of sampling frequency in the bare data implies 
some predictability of less active stocks from the knowledge of more active ones. But 
once the effect of asynchronous sampling is removed, we find no causal structure in lagged 
cross-correlations. Still, cross-correlations are significantly non-zero over time lags of the 
order of ten seconds, whereas auto-correlations decay on the scale of one or two seconds. 
This provides evidence of an information contagion process across stocks, at ultra-high 
frequency. 

The rest of the paper is organized as follows: we first discuss the origin of the Epps 
effect in simple theoretical models with synchronous (Section [2]) or asynchronous 
sampling (Section [3]). Section [4] discusses how to reconstruct the underlying correlations from 
asynchronously sampled data, in theoretical models. Section [5] applies these insights to 
empirical tick-by-tick data of NYSE. We summarize and discuss our results in Section [6]. 
Technical derivations and proofs are relegated to the appendix for the sake of readability. 

^1 The finiteness of the tick size is also a significant source of Epps effect; its impact has been investigated 
in [10] [11]. 

2. The origin of Epps effect: simple theoretical models 

Consider a multivariate time series with stationary increments $dX^i_t$, where $t$ is a 
continuous time parameter and $i = 1, \dots, n$. The series will represent in the following the 
infinitesimal increment of the log-price of asset $i$ at time $t$; let the finite variation of 
log-price after a time $\Delta t$ be given by^2: 

$X^i_{\Delta t} = \int_0^{\Delta t} dX^i_t$ 

and say that the infinitesimal, lagged correlations are given by: 

$c^{ij}_{t-t'} \, dt \, dt' = \langle dX^i_t \, dX^j_{t'} \rangle$ 

while the spectrum $S^{ij}_\omega$ is defined as 

$S^{ij}_\omega = \int d\tau \, c^{ij}_\tau \, e^{i\omega\tau}$ . 

We will be interested in characterizing the dependence of the finite, equal time correlation 
$C^{ij}_{\Delta t} = \langle X^i_{\Delta t} X^j_{\Delta t} \rangle$ on the time scale $\Delta t$. Its behavior can be extracted from the knowledge 
of the series $dX^i_t$, which can be related to $C^{ij}_{\Delta t}$ as: 

(1) $C^{ij}_{\Delta t} = \int_0^{\Delta t} \int_0^{\Delta t} dt \, dt' \, c^{ij}_{t-t'} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} d\omega \, \frac{S^{ij}_\omega}{\omega^2} \, (e^{-i\omega\Delta t} - 1)(e^{i\omega\Delta t} - 1)$ 

While for a purely Brownian motion the scaling of $C^{ij}_{\Delta t}$ is linear in $\Delta t$, in the general case 
we will quantify deviations from linearity of $C^{ij}_{\Delta t}$ by considering the quantity: 

(2) $\rho^{ij}_{\Delta t} = \frac{C^{ij}_{\Delta t}}{\sqrt{C^{ii}_{\Delta t} \, C^{jj}_{\Delta t}}}$ , 

that is the Pearson correlation coefficient, built by normalizing the covariance to the 
variances. We will say that the Epps effect is absent whenever $\rho^{ij}_{\Delta t}$ is independent of $\Delta t$, and 
it is present otherwise. 

It is interesting to remark some general features of $\rho^{ij}_{\Delta t}$: first, the positivity of the 
eigenvalues of the covariance matrix $C^{ij}_{\Delta t}$ ensures that $|\rho^{ij}_{\Delta t}| \le 1$. The finiteness of the limit 
$\Delta t \to 0$ of $\rho^{ij}_{\Delta t}$ can then be checked from the continuity of the coefficients: if both auto- and 
cross-correlations are infinite at the origin, then $\rho^{ij}_{\Delta t}$ is finite; the same holds if auto- and 
cross-correlations are both finite at the origin. The case with infinite auto-correlations and 
finite cross-correlations gives instead $\rho^{ij}_{\Delta t} \to 0$: whenever the time needed by the system 
to auto-correlate is much smaller than the time needed to cross-correlate, then the equal 
time correlation coefficient goes to 0. In the opposite limit $\Delta t \to \infty$, if $S^{ij}_0 = \int d\tau \, c^{ij}_\tau$ is 
finite, one can also see that: 

$\rho^{ij}_\infty = \frac{S^{ij}_0}{\sqrt{S^{ii}_0 \, S^{jj}_0}}$ 

The behavior of $\rho^{ij}_{\Delta t}$ during the transient is also interesting, as it contains non-trivial 
information about the time needed by the system to correlate the dynamics. The origin 
of the Epps effect is best illustrated by discussing a few simple examples. 

^2 All the following considerations can be easily generalized to the discrete time case. We choose for 
simplicity to present them in continuous time. Notice that $c^{ij}_\tau$ is to be interpreted as a distribution (e.g. it 
may contain terms proportional to $\delta(\tau)$). 

2.1. Example (Correlated Brownian motions): Let's consider the case of a bivariate 
process of the kind: 

$dX^1_t = \sqrt{c} \, d\eta^0_t + \sqrt{1-c} \, d\eta^1_t$ 
$dX^2_t = \sqrt{c} \, d\eta^0_t + \sqrt{1-c} \, d\eta^2_t$ 

where the $d\eta^i_t$ are white noises, so that $\langle d\eta^i_t \, d\eta^j_{t'} \rangle = \delta^{ij} \, \delta_{t-t'} \, dt \, dt'$. 
This is the only case in which linearity strictly holds both for variance 
and covariance: 

$C^{12}_{\Delta t} = c \, \Delta t$ 
$C^{11}_{\Delta t} = C^{22}_{\Delta t} = \Delta t$ 

so that: 

$\rho^{12}_{\Delta t} = c$ , 

independent of $\Delta t$, and there is no Epps effect. 
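
As an illustration of the absence of the Epps effect in this example, the following minimal sketch simulates a discrete-time version of the process and measures the Pearson correlation of returns aggregated over windows of different length. The function names, the value c = 0.5, the seed and the window lengths are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def simulate_correlated_increments(c, n_steps, rng):
    """Discrete-time increments sharing a common noise with weight sqrt(c)."""
    eta0, eta1, eta2 = rng.standard_normal((3, n_steps))
    dx1 = np.sqrt(c) * eta0 + np.sqrt(1.0 - c) * eta1
    dx2 = np.sqrt(c) * eta0 + np.sqrt(1.0 - c) * eta2
    return dx1, dx2


def equal_time_correlation(dx1, dx2, window):
    """Pearson correlation of returns summed over non-overlapping windows."""
    n = (len(dx1) // window) * window
    r1 = dx1[:n].reshape(-1, window).sum(axis=1)
    r2 = dx2[:n].reshape(-1, window).sum(axis=1)
    return float(np.corrcoef(r1, r2)[0, 1])


rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
dx1, dx2 = simulate_correlated_increments(0.5, 200_000, rng)
# The estimates should fluctuate around c at every aggregation scale.
rho_by_scale = {w: equal_time_correlation(dx1, dx2, w) for w in (1, 10, 100)}
print(rho_by_scale)
```

Up to statistical fluctuations, the measured coefficient stays close to c whatever the window length.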

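2.2. Example (Lagged series): Let's now consider the lagged version of the previous 
process: 

$dX^1_t = \sqrt{c} \, d\eta^0_t + \sqrt{1-c} \, d\eta^1_t$ 
$dX^2_t = \sqrt{c} \, d\eta^0_{t+\tau} + \sqrt{1-c} \, d\eta^2_t$ 

In this case: 

$c^{ii}_{t-t'} = \delta_{t-t'}$ 
$c^{12}_{t-t'} = c \, \delta_{t-t'-\tau}$ 

and it is easy to see (appendix [B]) that in this case $\rho$ reads: 

$\rho^{12}_{\Delta t} = c \left(1 - \frac{\tau}{\Delta t}\right) \theta(\Delta t - \tau)$ , 

where $\theta(t)$ is the step function, so the presence of an Epps effect is evident. 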
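
The saturation curve of this example is easy to tabulate. The sketch below evaluates the expression for the lagged case on a few time scales; the function name and the parameter values (c = 0.5, lag of 2 time units) are illustrative assumptions, and a non-negative lag is assumed.

```python
import numpy as np


def rho_lagged(dt, c, tau):
    """Equal-time correlation c * (1 - tau/dt) * theta(dt - tau), for tau >= 0 and dt > 0."""
    dt = np.asarray(dt, dtype=float)
    return np.where(dt > tau, c * (1.0 - tau / dt), 0.0)


scales = np.array([1.0, 2.0, 4.0, 20.0])
# Zero below the lag, then a slow approach to the asymptotic value c.
print(rho_lagged(scales, c=0.5, tau=2.0))
```

The correlation vanishes for time scales shorter than the lag and approaches c only when the time scale is much larger than the lag.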
2.3. Example (Different widths): We can now consider another bivariate process, whose 
lagged correlations are: 

$c^{12}_{t-t'} = c \, \frac{1}{2\xi_l} \, e^{-|t-t'|/\xi_l}$ 
$c^{ii}_{t-t'} = \frac{1}{2\xi_s} \, e^{-|t-t'|/\xi_s}$ 

with the conditions $\xi_l > \xi_s$ and $c < 1$ ensuring $|\rho^{12}_{\Delta t}| < 1$. In this case one has: 

$\rho^{12}_{\Delta t} = c \, \frac{\Delta t + \xi_l \, (e^{-\Delta t/\xi_l} - 1)}{\Delta t + \xi_s \, (e^{-\Delta t/\xi_s} - 1)}$ 

Such a quantity is a constant only for $\xi_s = \xi_l$, while in the general case it is a function which 
grows from $\rho^{12}_0 = c \, \xi_s/\xi_l$ to an asymptotic value $\rho^{12}_\infty = c$, as represented in the blue lines of 
figure [3]. The case $\xi_s \to 0$ is also interesting, as the variance becomes linear, while $\rho^{12}_{\Delta t}$ is 
given by: 

$\rho^{12}_{\Delta t} = c \left[ 1 + \frac{\xi_l}{\Delta t} \, (e^{-\Delta t/\xi_l} - 1) \right]$ 

The above examples show that an Epps effect is present if the covariance of a process 
grows with $\Delta t$ at a rate smaller than the variance, or equivalently if the infinitesimal, lagged 
cross-correlations $c^{ij}_\tau$ are not proportional to the auto-correlations $c^{ii}_\tau$. We will see in section [5] 
that financial time series show at high frequency a correlation structure which is reminiscent 
of the one of these examples; in particular such structure is well fitted by a model where 
the dynamics of correlations is described by a lag parameter $\tau$ and a width parameter $\xi$, 
and where variances grow faster than covariances. In [27] this approach is also 
used to describe the dynamics related with the time evolution of the correlation matrix. 
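
The two limits quoted for the different-widths example can be checked numerically. The following sketch evaluates the ratio of integrated correlations, using the fact that an exponential correlation of width xi integrates to Delta t + xi (exp(-Delta t/xi) - 1); the function name and the parameter values are illustrative assumptions.

```python
import math


def rho_different_widths(dt, c, xi_l, xi_s):
    """Equal-time correlation for exponential cross- and auto-correlations."""

    def integrated(xi):
        # Double integral over [0, dt]^2 of exp(-|t - t'| / xi) / (2 * xi).
        return dt + xi * math.expm1(-dt / xi)

    return c * integrated(xi_l) / integrated(xi_s)


c, xi_l, xi_s = 0.5, 0.8, 0.3
for dt in (1e-4, 0.1, 1.0, 10.0, 1e4):
    print(dt, rho_different_widths(dt, c, xi_l, xi_s))
```

For very small time scales the value approaches c xi_s / xi_l, and for very large ones it approaches c, with a monotonic growth in between for these parameters.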

3. Asynchronous sampling of correlated random walks 

While studying a multivariate time series at very high frequency (say, tick-by-tick 
financial data), it is unlikely that all transactions happen simultaneously; additionally some 
time bins may contain no data point at all, as no transaction took place. This fact may 
cause problems in the estimation of volatilities and correlations [12], especially in the 
extreme case in which one tries to evaluate such quantities at time scales of the order of the 
inter-trade time. A possible approach to deal with this issue is the creation of a synchronous 
series [15] out of the asynchronous one by means of some prescription, such as linear 
interpolation or previous-tick interpolation [23]. We adopt this latter, simpler estimator to 
study the impact of asynchrony on measured correlations, as it allows an easy analytical 
treatment of such quantities without requiring any assumption on their genuine nature (in 
particular we will focus on models containing lagged and short ranged correlations). 

Consider an underlying synchronous process $dX^i_t$ defined as in section [2] and $n$ subsets 
of points $U^i = \{t^i_k\}_k$ randomly drawn on the real line. Let the probability of drawing a 
point between $t$ and $t + dt$ for subset $U^i$ be given by $\lambda_i \, dt$. In this way for each subset $U^i$ 
the number of points drawn in an interval $[t_1, t_2]$ is a Poissonian random variable of mean 
$\lambda_i (t_2 - t_1)$. The corresponding waiting time distribution is exponential, and is given by 
$p_i(t) = \lambda_i e^{-\lambda_i t}$. Given a set of $U^i$ and a realization of the underlying synchronous process 
$X^i_t$, one can define an asynchronous process: 

$\tilde X^i_{\Delta t} = \int_{t^i_0}^{t^i_{\Delta t}} dX^i_t$ 

where $t^i_0 = \max\{t^i_k \in U^i \,|\, t^i_k < 0\}$ and $t^i_{\Delta t} = \max\{t^i_k \in U^i \,|\, t^i_k < \Delta t\}$. This time series is 
a piecewise constant function, with discrete jumps at the points $\Delta t = t^i_k$, as shown in 
figure [1]; notice that this construction implements the previous tick estimator prescription 
(PTE) to deal with missing data. Covariance can be defined in this case as: 

$\tilde C^{ij}_{\Delta t} = E\left[ \langle \tilde X^i_{\Delta t} \, \tilde X^j_{\Delta t} \rangle \right]$ 

where $E[\cdot]$ denotes expectation value with respect to the sampling process. Then one can 
generalize the Epps effect, defining as in the previous case the function $\tilde\rho^{ij}_{\Delta t}$: if such function 
depends on $\Delta t$, a (generalized) Epps effect is present, otherwise it is absent. 

FIGURE 1. We plot here a realization of a synchronous process $X_{\Delta t}$ (dashed 
line), and a randomly sampled version of the same realization (full line), 
obtained with a sampling rate $\lambda = 0.05$. 
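
The previous tick prescription described above can be sketched in a few lines. This is a minimal illustration under the stated assumption of Poissonian trade times; the helper names `poisson_tick_times` and `previous_tick` are hypothetical, and the strict inequality in the definition of the last tick is implemented with a left-sided search.

```python
import numpy as np


def poisson_tick_times(rate, t_start, t_end, rng):
    """Sorted points of a homogeneous Poisson process on [t_start, t_end)."""
    n_ticks = rng.poisson(rate * (t_end - t_start))
    return np.sort(rng.uniform(t_start, t_end, n_ticks))


def previous_tick(tick_times, tick_values, query_times):
    """Value recorded at the last tick strictly before each query time."""
    tick_times = np.asarray(tick_times, dtype=float)
    tick_values = np.asarray(tick_values, dtype=float)
    idx = np.searchsorted(tick_times, query_times, side="left") - 1
    if np.any(idx < 0):
        raise ValueError("a query time precedes the first tick")
    return tick_values[idx]


# Toy usage: three trades, prices held constant between trades.
ticks = np.array([0.5, 1.2, 3.0])
prices = np.array([10.0, 11.0, 12.0])
print(previous_tick(ticks, prices, np.array([1.0, 1.2, 2.9, 3.5])))
```

Sampling the piecewise constant series on a regular grid of spacing Delta t and differencing it gives the asynchronous returns whose covariance is studied in the rest of this section.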

We will now show three properties which allow one to extract information about the 
asynchronous process $\tilde X^i_{\Delta t}$ given the spectrum of the synchronous process $X^i_{\Delta t}$. The proof of 
these results is given in appendix [A]. 
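P1: Covariance of asynchronous processes. Given an asynchronous time series $\tilde X^i_{\Delta t}$ defined 
using a synchronous time series of spectrum $S^{ij}_\omega$ and waiting time distributions $p_i(t) = 
\theta(t) \lambda_i e^{-\lambda_i t}$, for $i \neq j$ it holds: 

(3) $\tilde C^{ij}_{\Delta t} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} d\omega \, \frac{S^{ij}_\omega}{\omega^2} \, \frac{\lambda_i \lambda_j}{(\lambda_i + i\omega)(\lambda_j - i\omega)} \, (e^{-i\omega\Delta t} - 1)(e^{i\omega\Delta t} - 1)$ 

Equivalently, covariance in the asynchronous case can be computed by correcting the 
synchronous spectrum with the substitution: 

(4) $S^{ij}_\omega \to \tilde S^{ij}_\omega = S^{ij}_\omega \, \frac{\lambda_i \lambda_j}{(\lambda_i + i\omega)(\lambda_j - i\omega)}$ 

In real space, such substitution is equivalent to the convolution: 

(5) $\tilde c^{ij}_{t'} = \frac{\lambda_i \lambda_j}{\lambda_i + \lambda_j} \left[ \int_{-\infty}^{t'} d\tau \, c^{ij}_\tau \, e^{-\lambda_j (t' - \tau)} + \int_{t'}^{+\infty} d\tau \, c^{ij}_\tau \, e^{-\lambda_i (\tau - t')} \right]$ 
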

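P2: Variance of asynchronous processes. Consider the asynchronous time series $\tilde X^i_{\Delta t}$, 
defined using a synchronous time series of spectrum $S^{ii}_\omega$ and a waiting time distribution 
$p(t) = \theta(t) \lambda e^{-\lambda t}$. Then it holds: 

$\tilde C^{ii}_{\Delta t} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} d\omega \, \frac{S^{ii}_\omega}{\omega^2} \, (e^{-i\omega\Delta t} - 1)(e^{i\omega\Delta t} - 1) + \frac{2}{\lambda^2} \, \frac{1}{2\pi} \int_{-\infty}^{+\infty} d\omega \, \frac{S^{ii}_\omega}{1 + \omega^2/\lambda^2} \, (e^{-i\omega\Delta t} - e^{-\lambda\Delta t})$ 

Equivalently, to compute the variance in the asynchronous case it is necessary to add to 
the synchronous value a correction, so that one gets: 

(6) $\tilde C^{ii}_{\Delta t} = C^{ii}_{\Delta t} + \frac{2}{\lambda^2} \left( \hat c^{ii}_{\Delta t} - e^{-\lambda\Delta t} \, \hat c^{ii}_0 \right)$ 

where $\hat c^{ii}_\tau$ is the Fourier anti-transform of the damped spectrum $S^{ii}_\omega / (1 + \omega^2/\lambda^2)$. 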

P3: Case of linear variance. If an asynchronous time series $\tilde X^i_{\Delta t}$ is defined using a 
synchronous series of variance $C^{ii}_{\Delta t}$ linear in $\Delta t$ (corresponding to a constant spectrum $S^{ii}_\omega$) 
and waiting time distribution $p(t) = \theta(t) \lambda e^{-\lambda t}$, then the asynchronous value of its variance 
corresponds to the synchronous one. 
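
As a consistency check, P3 can be recovered from P2 in one line. The sketch below assumes that the correction term of P2 has the form $\frac{2}{\lambda^2}(\hat c^{ii}_{\Delta t} - e^{-\lambda\Delta t} \hat c^{ii}_0)$, with $\hat c^{ii}_\tau$ the anti-transform of $S^{ii}_\omega/(1 + \omega^2/\lambda^2)$, and takes a constant spectrum $S^{ii}_\omega = \sigma^2$, where $\sigma^2$ is an illustrative symbol not used elsewhere in the text:

```latex
\begin{align*}
\hat c^{ii}_\tau &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} d\omega \,
    \frac{\sigma^2}{1 + \omega^2/\lambda^2} \, e^{-i\omega\tau}
    = \frac{\sigma^2 \lambda}{2} \, e^{-\lambda |\tau|} , \\
\hat c^{ii}_{\Delta t} - e^{-\lambda \Delta t} \, \hat c^{ii}_0
    &= \frac{\sigma^2 \lambda}{2} \left( e^{-\lambda \Delta t} - e^{-\lambda \Delta t} \right) = 0 , \\
\tilde C^{ii}_{\Delta t} &= C^{ii}_{\Delta t} = \sigma^2 \, \Delta t .
\end{align*}
```

The Lorentzian damping turns the delta-like auto-correlation into an exponential of rate $\lambda$, whose value at lag $\Delta t$ is exactly compensated by the term proportional to $e^{-\lambda\Delta t}$.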



The property P1 shows the effect of the random sampling on the measured 
covariance: if $\lambda = \lambda_i = \lambda_j$, substitution (4) is a low-pass filter (a Lorentzian) with a cutoff 
scale set by $\lambda$, which suppresses signal at frequencies larger than the sampling scale. In 
the case $\lambda_i \neq \lambda_j$ an effect of spurious causality^3 is also induced: the kernel in (4) has in general 
a complex phase, which generates an asymmetry between $\tilde c^{ij}_\tau$ and $\tilde c^{ij}_{-\tau}$, as pointed out in 
[15]. The direction of such asymmetry is such that the more frequently sampled series 
appears to influence the less sampled one: this merely reflects the fact that one can use 
the information contained in the more sampled series to successfully forecast the less 
sampled one. The property P2 allows one to calculate in general the asynchronous value of the 
variance, and in particular, for the simple case of a very narrow correlation coefficient $c^{ii}_\tau$, 
P3 implies that variance doesn't necessarily decrease as $\lambda_i$ gets smaller, while covariance 
always gets suppressed; this is why, generally speaking, asynchronous sampling tends to 
enhance the Epps effect. Notice that, while P1 directly relates $\tilde c^{ij}_\tau$ with $c^{ij}_\tau$ (and $\tilde S^{ij}_\omega$ with $S^{ij}_\omega$), 
P2 just connects $\tilde C^{ii}_{\Delta t}$ with $C^{ii}_{\Delta t}$: the asynchronous value of the auto-correlation function 
has to be indirectly obtained from $\tilde C^{ii}_{\Delta t}$. For $\tau \neq 0$, this can be done by observing that: 

(7) $\frac{d^2}{d\Delta t^2} \, \tilde C^{ii}_{\Delta t} = 2 \, \tilde c^{ii}_{\Delta t} = \frac{1}{\pi} \int_{-\infty}^{+\infty} d\omega \, \tilde S^{ii}_\omega \, e^{-i\omega\Delta t}$ 

For $\tau = 0$ instead the auto-correlation $\tilde c^{ii}_\tau$ may contain a $\delta_\tau$, which can be deduced from 
the behavior of $\tilde C^{ii}_{\Delta t}$ for $\Delta t \to 0$. Specifically, if $\tilde C^{ii}_{\Delta t} \sim \Delta t$, then the auto-correlation is 
divergent in $\tau = 0$, which signals the presence of a term $\delta_\tau$. Conversely, if $\tilde C^{ii}_{\Delta t} \sim \Delta t^2$ or if 
$\tilde C^{ii}_{\Delta t}$ vanishes faster than $\Delta t^2$, then $\tilde c^{ii}_\tau$ is regular in 0. 

^3 We employ the term "causality" in a loose sense, using the expression "returns of stock i cause returns 
of stock j" to signify that $\tilde c^{ij}_\tau > \tilde c^{ij}_{-\tau}$, that is, an asymmetry is measured in the lagged correlation of two 
stocks. 

These results allow us to generalize the analysis of the examples in section [2] to the 
asynchronous case. 



3.1. Example (Correlated Brownian motions): In this simple case the synchronous 
value of the correlation coefficient is given by: 

$c^{12}_\tau = c \, \delta_\tau$ 

If now we suppose the rates of the sampling processes to be all equal to $\lambda$, equation (3) 
can be used to calculate the asynchronous value of covariance, while the variance inherits 
linearity from the synchronous case. The result reads (see appendix [B]): 

$\tilde\rho^{12}_{\Delta t} = c \left( 1 + \frac{1}{\lambda\Delta t} \, (e^{-\lambda\Delta t} - 1) \right)$ , 

which is plotted in figure [3] (black line). In this case we have a spurious (induced by the 
sampling) Epps effect, as the original time series did not show any Epps effect. 
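
The spurious Epps effect of this example can be checked with a small Monte Carlo experiment. The sketch below relies on an observation that is ours rather than the paper's: for Brownian motions sampled with the previous tick prescription, the covariance over a window equals c times the expected overlap of the two time intervals effectively spanned by the observed returns, while the variance stays equal to the window length (property P3). All function names, the seed and the parameter values are illustrative assumptions.

```python
import numpy as np


def rho_async_brownian(dt, c, lam):
    """Closed form c * (1 + (exp(-lam*dt) - 1) / (lam*dt)) for equal rates."""
    x = lam * dt
    return c * (1.0 + np.expm1(-x) / x)


def observation_interval(dt, lam, n, rng):
    """Interval [last tick before 0, last tick before dt] for n sampled windows."""
    age = rng.exponential(1.0 / lam, n)    # time elapsed since the last tick before dt
    extra = rng.exponential(1.0 / lam, n)  # time between the last tick before 0 and 0
    # If no tick falls inside (0, dt) both ends coincide and the interval is empty.
    start = np.where(age > dt, dt - age, -extra)
    return start, dt - age


def rho_async_monte_carlo(dt, c, lam_1, lam_2, n, rng):
    """Monte Carlo estimate of the equal-time correlation under Poisson sampling."""
    s1, e1 = observation_interval(dt, lam_1, n, rng)
    s2, e2 = observation_interval(dt, lam_2, n, rng)
    overlap = np.clip(np.minimum(e1, e2) - np.maximum(s1, s2), 0.0, None)
    return c * overlap.mean() / dt  # the variance equals dt, by P3


rng = np.random.default_rng(1)
estimate = rho_async_monte_carlo(1.0, 0.5, 1.0, 1.0, 400_000, rng)
exact = rho_async_brownian(1.0, 0.5, 1.0)
print(estimate, exact)
```

The two numbers agree within the Monte Carlo error, and both are well below c when the time scale is comparable with the inter-trade time.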



3.2. Example (Lagged series): Now we turn to the synchronous process: 

$dX^1_t = \sqrt{c} \, d\eta^0_t + \sqrt{1-c} \, d\eta^1_t$ 
$dX^2_t = \sqrt{c} \, d\eta^0_{t+\tau} + \sqrt{1-c} \, d\eta^2_t$ 

and consider again sampling rates $\lambda_1 = \lambda_2 = \lambda$. Then, using the above properties, one can 
find that (see appendix [B]): 

$\tilde\rho^{12}_{\Delta t} = \frac{c}{2\lambda\Delta t} \, e^{-\lambda(\Delta t + \tau)} \, (1 - e^{\lambda\Delta t})^2$   if $\Delta t < \tau$ 

$\tilde\rho^{12}_{\Delta t} = c \left[ 1 + \frac{e^{-\lambda\Delta t} \cosh(\lambda\tau) - e^{-\lambda\tau}}{\lambda\Delta t} - \frac{\tau}{\Delta t} \right]$   if $\Delta t > \tau$ 

and check that the Epps effect is enhanced by the effect of the sampling (covariance grows even 
slower than in the synchronous case), so that genuine and spurious effects superimpose as 
shown in figure [2]. 

Figure 2. Equal time correlation coefficient for two lagged processes, both 
for the case of synchronous and asynchronous sampling. The lag parameter 
$\tau$ and the sampling rate $\lambda$ are set to $\tau = 0$, $\lambda = 1$ (black line), $\tau = 2$, $\lambda = \infty$ 
(blue line), $\tau = 2$, $\lambda = 1$ (red line). 
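
The piecewise expression of this example can be evaluated as follows. This is a sketch with a hypothetical function name; the exponentials are regrouped in an algebraically equivalent way to avoid overflow at large sampling rates, and a non-negative lag is assumed.

```python
import math


def rho_lagged_async(dt, c, tau, lam):
    """Equal-time correlation of the lagged process under Poisson sampling of rate lam."""
    x = lam * dt
    if dt < tau:
        # exp(-lam*(dt+tau)) * (1 - exp(lam*dt))**2, regrouped to avoid overflow
        return c / (2.0 * x) * math.exp(-lam * (tau - dt)) * math.expm1(-x) ** 2
    # exp(-lam*dt) * cosh(lam*tau), written as a sum of decaying exponentials
    damped_cosh = 0.5 * (math.exp(-lam * (dt - tau)) + math.exp(-lam * (dt + tau)))
    return c * (1.0 - tau / dt + (damped_cosh - math.exp(-lam * tau)) / x)


for dt in (0.5, 1.0, 2.0, 4.0, 20.0):
    print(dt, rho_lagged_async(dt, c=0.5, tau=2.0, lam=1.0))
```

Setting the lag to zero gives back the purely spurious effect of example 3.1, while a very large sampling rate gives back the synchronous result of example 2.2.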

3.3. Example (Different widths): Also in this case the genuine Epps effect is enhanced 
by the asynchronous sampling; indeed in this case both variance and covariance are 
influenced by the sampling and produce a spurious effect. It is possible to calculate the 
coefficient $\tilde C^{12}_{\Delta t}$ assuming sampling rates $\lambda_1$ and $\lambda_2$. Its value reads: 

$\tilde C^{12}_{\Delta t} = c \left[ \Delta t + \frac{\lambda_1 \lambda_2 \xi_l^3}{2 u_1 v_2} \, (e^{-\Delta t/\xi_l} - 1) - \frac{\lambda_2}{\lambda_1 (\lambda_1 + \lambda_2) u_1 v_1} \, (e^{-\lambda_1 \Delta t} - 1) + (\lambda_1 \leftrightarrow \lambda_2) \right]$ 

where the exchange $\lambda_1 \leftrightarrow \lambda_2$ applies to the two exponential terms only, and the coefficients 
$u_i = \lambda_i \xi_l + 1$ and $v_i = \lambda_i \xi_l - 1$ are defined in appendix [B]. The variance is given by: 

$\tilde C^{ii}_{\Delta t} = \Delta t + \xi_s \, \frac{\lambda_i^2 \xi_s^2 \, (e^{-\Delta t/\xi_s} - 1) - (e^{-\lambda_i \Delta t} - 1)}{\lambda_i^2 \xi_s^2 - 1}$ 

Notice that the sampling induces a singular auto-correlation function: 
while the synchronous value of $c^{ii}_\tau$ is regular in the origin, one can check that it becomes 
singular in zero as an effect of the sampling. In particular, using equation (7), one finds 
that the asynchronous auto-correlation is given by: 

$\tilde c^{ii}_\tau = \frac{\delta_\tau}{1 + \lambda_i \xi_s} + \frac{\lambda_i^2 \xi_s}{2 (\lambda_i^2 \xi_s^2 - 1)} \left[ e^{-|\tau|/\xi_s} - e^{-\lambda_i |\tau|} \right]$ 

and it is easy to see that the regular part goes to zero for small values of $\tau$, a feature which 
is also present in empirical data. 

This example shows how the Epps effect can be induced both from variance and covariance 
(as in this case neither of those quantities is linear), and that the sampling may additionally 
give a spurious contribution to the Epps effect (as the functional dependence of $\tilde C^{ii}_{\Delta t}$ and 
$\tilde C^{ij}_{\Delta t}$ changes due to the sampling). In figure [3] some typical curves for normalized variance 
and covariance are presented for this model. 
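
The behavior of the asynchronous variance in this example can be explored numerically. The sketch below evaluates the variance of a series with exponential auto-correlation of width xi_s sampled at rate lam, together with the weight 1/(1 + lam xi_s) of the delta term induced by the sampling; the function names and the parameter values are illustrative assumptions, and the degenerate case lam xi_s = 1 is not handled.

```python
import math


def async_variance(dt, xi_s, lam):
    """Variance at scale dt for exponential auto-correlation under Poisson sampling."""
    x2 = (lam * xi_s) ** 2
    numerator = x2 * math.expm1(-dt / xi_s) - math.expm1(-lam * dt)
    return dt + xi_s * numerator / (x2 - 1.0)


def delta_weight(xi_s, lam):
    """Coefficient of the delta term in the sampled auto-correlation."""
    return 1.0 / (1.0 + lam * xi_s)


xi_s, lam = 0.3, 1.0
for dt in (1e-6, 0.1, 1.0, 10.0):
    print(dt, async_variance(dt, xi_s, lam) / dt)
print(delta_weight(xi_s, lam))
```

At very small time scales the normalized variance tends to the weight of the delta term, which is the signature of the singular auto-correlation discussed above; a very large sampling rate recovers the synchronous variance.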

4. Filtering of asynchronous time series 

An interesting application of property P1 concerns data filtering of asynchronous time 
series: as it is possible to quantify how a synchronous time series is influenced by an 
exponential random sampling, it is also possible to discount its damping effect on the 
high frequency region of the cross-correlation spectrum. As the random sampling induces 
a convolution with a known kernel, the reconstruction of the genuine cross-correlation 
structure requires a deconvolution. In particular, given a measured asynchronous time 
series $\tilde X^i_t$, the deconvolution procedure can be carried out along these lines: 

(1) Calculate the measured spectrum $\tilde S^{ij}_\omega$ of the time series from raw data $\tilde X^i_t$; 

(2) Compute the sampling rate $\lambda_i$ for each process; 

(3) Estimate the genuine spectrum $S^{ij}_\omega$ by inverting (4):^4 

$S^{ij}_\omega = \tilde S^{ij}_\omega \left( \frac{\lambda_i \lambda_j}{(\lambda_i + i\omega)(\lambda_j - i\omega)} \right)^{-1} = \tilde S^{ij}_\omega \, (K^{ij}_\omega)^{-1}$ 

(4) Write cross-correlations $c^{ij}_{t-t'}$, using equation (1). 

^4 Notice that the corrected spectrum has the right properties to construct consistent correlation matrices; 
in particular it is Hermitian and it satisfies $S^{ij}_\omega = S^{ji}_{-\omega}$. 

Figure 3. Normalized covariance and variance for two processes displaying 
exponential decay both of cross-correlation and of auto-correlation, where 
the decay constants are respectively $\xi_l$ and $\xi_s$, with $\xi_s < \xi_l$. Both the case 
of synchronous and asynchronous sampling (of common rate $\lambda$) are 
represented. On the right, the variance is plotted in the cases $\lambda = \infty$, $\xi_s = 0.3$ 
(blue line), $\lambda = 1$, $\xi_s = 0$ (black line) and $\lambda = 1$, $\xi_s = 0.3$ (red line). On the 
left, the covariance is represented in the cases $\lambda = \infty$, $\xi_l = 0.4$ (thick blue 
line), $\lambda = \infty$, $\xi_l = 0.8$ (narrow blue line), $\lambda = 1$, $\xi_l = 0.4$ (thick red line), 
$\lambda = 1$, $\xi_l = 0.8$ (narrow red line) and $\lambda = 1$, $\xi_l = 0$ (black line). 

This deconvolution procedure, known as inverse filtering, should in principle allow one to 
compute the genuine signal with infinite accuracy; in practice, dealing with time series of 
finite length and in which time is discretized, some effects have to be taken into account. 
Moreover, while effects of discreteness and finite size are easy to quantify (appendix [C]), a 
more careful treatment of noise is needed: as the inverse filter amplifies the high 
frequency region of the spectrum with a term proportional to $\omega^2$, the noise that typically 
dominates that region crucially affects the accuracy of the reconstructed signal. A possible 
solution is to set a cutoff to the maximum frequency used to deconvolve the spectrum, 
choosing for example a deconvolution kernel of the kind: 

$(K^{ij}_\omega)^{-1} \, \frac{|K^{ij}_\omega|^2}{|K^{ij}_\omega|^2 + (\mathrm{SNR}^{ij}_\omega)^{-1}}$ 

where $\mathrm{SNR}^{ij}_\omega$ is the expected signal-to-noise ratio of the genuine signal. This leads to what 
is called a Wiener filter. 
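
The filtering steps of this section can be sketched on a uniform grid of lags as follows. This is a minimal illustration, not the procedure used for the empirical analysis: it assumes a circular (FFT-based) convolution, a frequency-independent signal-to-noise ratio `snr`, and a sampling kernel built directly in real space from the convolution (5); all function names are hypothetical. Building the kernel in real space and transforming it with the same FFT avoids any ambiguity in the sign convention of the Fourier transform.

```python
import numpy as np


def sampling_kernel(lags, lam_i, lam_j):
    """Real-space kernel of the convolution (5) on a grid of lags."""
    prefactor = lam_i * lam_j / (lam_i + lam_j)
    decay = np.where(lags >= 0, lam_j, lam_i) * np.abs(lags)
    return prefactor * np.exp(-decay)


def _kernel_spectrum(lags, lam_i, lam_j):
    step = lags[1] - lags[0]
    kernel = sampling_kernel(lags, lam_i, lam_j)
    return np.fft.fft(np.fft.ifftshift(kernel)) * step


def apply_sampling_kernel(c_true, lags, lam_i, lam_j):
    """Forward model: genuine lagged correlation -> measured one."""
    k_hat = _kernel_spectrum(lags, lam_i, lam_j)
    s_true = np.fft.fft(np.fft.ifftshift(c_true))
    return np.fft.fftshift(np.fft.ifft(s_true * k_hat).real)


def wiener_deconvolve(c_measured, lags, lam_i, lam_j, snr):
    """Regularized inverse filter: K^* / (|K|^2 + 1/snr) applied to the spectrum."""
    k_hat = _kernel_spectrum(lags, lam_i, lam_j)
    s_measured = np.fft.fft(np.fft.ifftshift(c_measured))
    gain = np.conj(k_hat) / (np.abs(k_hat) ** 2 + 1.0 / snr)
    return np.fft.fftshift(np.fft.ifft(s_measured * gain).real)


# Noise-free round trip on a symmetric grid whose central point is lag 0.
lags = (np.arange(2048) - 1024) * 0.1
c_true = 0.5 * np.exp(-np.abs(lags - 1.0) / 2.0)
c_measured = apply_sampling_kernel(c_true, lags, 1.0, 0.5)
c_recovered = wiener_deconvolve(c_measured, lags, 1.0, 0.5, snr=1e12)
print(np.max(np.abs(c_recovered - c_true)))
```

With noisy data a finite `snr` trades resolution for stability: frequencies where the kernel is smaller than the noise level are suppressed instead of amplified.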



5. Empirical analysis on NYSE data 

An empirical analysis has been carried out using tick-by-tick data from the New York 
Stock Exchange (NYSE) collected during the period going from 02.01.2003 to 
12.31.2003. We studied daily time series (T = 20000 seconds) of the 100 most traded stocks, 
excluding for each day the first 45 and the last 21 minutes of trades, and then averaged over 
a set of M = 248 days to obtain the spectra $\tilde S^{ij}_\omega$. $\tilde X^i_{\Delta t}$ was computed from the observed 
values of $\log p^i_t$, where the price was defined to be constant between consecutive trades 
(PTE prescription). All series have been normalized to zero mean and unit variance. It 
has been assumed that measured prices are randomly sampled points of an underlying 
synchronous time series as described above. The sampling rates $\lambda_i$ were computed for each 
stock and the waiting time distributions have been taken to be exponential as a first order 
approximation. 

Cross-correlation coefficients have been systematically calculated; as expected, raw 
cross-correlation coefficients $\tilde c^{ij}_\tau$ show a narrow peak near $\tau = 0$ corresponding to the market mode 
(figure [4]), justifying a fit with functions of the form: 

(8) $\tilde c^{ij}_\tau = c_{sync} \, e^{-|\tau - \tau_{sync}|/\xi_{sync}}$ 

The influence of the asynchronous sampling on these inferred parameters is indeed relevant, 
as the typical sampling times $\lambda_i^{-1}$ are of the same order as $\xi_{sync}$; the simplest way to take 
into account its effect is to fit using functions of the form: 

(9) $\tilde c^{ij}_\tau = c_{asyn} \, e^{-|\tau - \tau_{asyn}|/\xi_{asyn}} * K^{ij}_\tau$ 

where $K^{ij}_\tau$ is the kernel appearing in (5), which depends on the estimated sampling 
frequencies $\lambda_i$ and $\lambda_j$, and $*$ denotes convolution. 

Auto-correlations have also been computed for all the stocks, and both their qualitative 
and quantitative behavior turn out to be very different from the case of cross-correlations. 
In particular one can see that all auto-correlations are positively divergent in the origin, 
but assume finite values for lags different from 0, as shown in figure [5]. Then the simplest 
fit that can be performed is the one with a function of the kind: 

(10) c* = 

which is the superposition of a purely Brownian part with a fast decaying part. As in 
the case of cross-correlations, we can also fit those functions using their asynchronous 
counterpart: 

(11) K = (a asyn - * a ° y ; )Sr 




basyn 



£°fjg^f l p -\r\/Usyn _ p -\i\r\ 



2[(Ai£ asj/n ) 2 — 1] 



■sdsyn J 

as suggested by the examples discussed in the previous sections. The results of the fit of 
auto- and cross-correlation coefficients with the raw and corrected functions defined above 
are summarized in Table [1], where three kinds of ensembles (AC, T and L) were considered. 

First we discuss the results for the ensemble AC, which contains the 100 most traded 
assets of NYSE, and has been used to compute the infinitesimal auto-correlation coefficients. 
The raw functions have a raw width $\xi_{sync}$ broadly distributed around a mean value of 20 s, 
as shown in Table [1], and are typically characterized by a bimodal shape (79% of the 
empirical functions are compatible with zero for $\tau = 0$) which the raw model cannot account 
for. The inclusion of the sampling effect in the fitting functions improves the descriptive 
power of the model just slightly: on average the chi-square is reduced by about 30%, but 
fluctuations in the ensemble are strong. Indeed, the asynchrony explains naturally the 
bimodal shape of the auto-correlations, and shifts the width of the corrected function $\xi_{asyn}$ 
to a small interval centered around a value of 1 s (figure [6]), thus providing a mechanism 
to explain most of the signal width. A similar result holds for the ratios $a_{sync}/b_{sync}$ and 
$a_{asyn}/b_{asyn}$: while the former follows a broad distribution, the latter is sharply peaked 
around a mean value of ~ 1.5. These results do not qualitatively change if one takes as 
synchronous fitting function the superposition of a delta function with two exponentials. 

Figure 4. Left: The raw, infinitesimal cross-correlation coefficient $\tilde c^{GE,K}_\tau$ 
is shown for the pair of assets GE and K as a function of the lag $\tau$ (black 
points); the asymmetry of this function can be explained assuming genuine 
correlations of the simple form $c^{GE,K}_\tau = c \, \delta_\tau$ and convoluting the effect 
of the sampling (red line); the best fit of the form of equation (9) is also 
shown (blue line). Right: Infinitesimal cross-correlation coefficient $\tilde c^{12}_\tau$ for 
two asynchronously sampled processes (black points); the evolution of the 
underlying time series with constant correlation was simulated. Sampling 
times were taken to coincide with those of the stocks GE and K in the 
data set. The red line shows the theoretical correlation curve obtained for 
the same underlying process with an exponential waiting time distribution 
matching the measured sampling rates. 
Figure 5. A typical infinitesimal auto-correlation coefficient (in this case 
$\tilde c^{CAG}_\tau$) is plotted (black points). Its best fit of the form (10) is plotted in 
blue, while the best fit of the form (11) is represented in red. Notice that 
even if the empirical function we plotted is negative and has a bimodal 
shape, a positive diverging contribution in $\tau = 0$ should also be taken into 
account. 

Table 1. Results for auto- and cross-correlation coefficients $\tilde c^{ij}_\tau$ fitted 
against the functions defined in section [5] for various ensembles. Ensemble 
AC, used to compute auto-correlations, contains the 100 most traded 
assets of the NYSE, while ensembles T and L contain, respectively, the 10 
most traded and the 10 least traded assets of the same market, and have been 
used to compute cross-correlation functions. For each of the parameters we 
write the ensemble average and show in parentheses its standard deviation. 

Ensemble   $\xi_{sync}$ (s)    $\xi_{asyn}$ (s)    $\tau_{sync}$ (s)   $\tau_{asyn}$ (s)   $\Delta\chi^2/\chi^2$ 
AC         21.8  (32.3)     1.27  (1.36)     -               -               0.30  (0.63) 
T-T        12.93 (1.56)     7.69  (2.07)     0.30  (1.55)    -0.27 (1.98)    -0.05 (0.14) 
T-L        21.42 (3.99)     9.36  (4.45)     8.62  (4.62)    2.10  (4.04)    0.69  (0.36) 
L-L        28.36 (4.60)     10.85 (6.13)     -0.73 (4.14)    -1.66 (4.98)    0.005 (0.08) 
Figure 6. Histogram of the fitted values of $\xi_{sync}$ (black bars) and $\xi_{asyn}$ 
(red bars), both for cross-correlations (left) and auto-correlations (right). 
In the case of cross-correlations we considered a sample consisting of the 10 
most active assets and the 10 least active ones, while for the auto-correlations 
we considered the 100 most active assets. Notice that while the left plot is in 
linear-linear scale, the right one is in log-linear scale: for auto-correlations 
most of the width is induced by the sampling, while for the cross-correlations 
the asynchrony seems to play a less significant role. 



The other ensembles which have been considered are T and L, containing respectively 
the 10 most and the 10 least traded assets of the AC ensemble; they have been used to calculate 
the infinitesimal cross-correlation coefficient for all the pairs of the form T-L, T-T and L-L. 
The inferred widths $\xi_{sync}$ are generally spread on a window of 30 s, ranging from 10 seconds 
(T-T ensemble) to 40 seconds (L-L), while the values of $\tau_{sync}$ often exceed 10 s in the T-L 
case, indicating that a lack of symmetry is present in this ensemble; the direction of the 
asymmetry reveals an influence of the most traded stocks on the less traded ones. 
For the T-L ensemble, the asynchronous model turns out to provide a better description 
of the data (see e.g. figure [4]), as residuals are significantly reduced, and most of the 
asymmetry is accounted for in the kernel. Additionally, asynchronous sampling explains 
much of the observed width of correlation functions, as shown in figure [6]. Still, compared 
to auto-correlations the histogram of estimated widths of cross-correlations is centered at 
values significantly different from zero, of the order of 10 seconds. In the T-T and L-L 
cases the descriptive power of the two models is almost identical (when sampling rates 
are similar, it becomes harder to statistically discriminate functions (8) and (9)). Again, 
even if a part of the width $\xi_{sync}$ is explained by sampling, the value of $\xi_{asyn}$ is significantly 
different from zero, meaning that other mechanisms contribute to the formation of the Epps 
effect. Interestingly, while the raw width varies significantly within the ensembles T-L, 
T-T and L-L, the corrected width $\xi_{asyn}$ is compatible for all of them and of the order of 
10 seconds. 
Figure 7. The raw equal time correlation coefficient $\rho_{\Delta t}$ (narrow red
line) is shown for the pair GDW and K, together with the same curve
obtained with filtered data (thick red line). The black line corresponds to
the correlation coefficient for a simulated process with the same asymptotic
value of $\rho_{\Delta t}$, sampling rates and statistics as the other curves, whose cross-
and auto-correlations are $\propto \delta(\tau)$; the dashed line is the filtered version of the
same curve.



In order to compensate for the effect of the sampling it is also possible to filter the
raw signal using the procedure described above; this allows us to evaluate the impact
of the asynchrony on the measured correlations as a function of the scale $\Delta t$. Figure [7]
shows the saturation curves of the correlation for a pair of assets using both raw and
filtered data and compares them with the ones obtained for a pair of simulated Brownian
motions with the same asymptotic value of correlation. Results obtained for simulated data
set the maximum efficiency of the filter, which is fixed by the length of the time series;
empirical data show that the reconstructed curve is well below such bound, indicating
that other effects do contribute to the formation of the Epps effect. These include
micro-structural effects, such as finite tick-size [10] [11], and possibly an intrinsic time
scale related to human reaction [16]. The same features are detected in figure [8], where the
infinitesimal raw cross-correlations $c^{ij}_\tau$ are compared to the filtered ones; the presence of a
residual Epps effect is indicated here by the finite width of the filtered curve. Additionally,
most of the asymmetry contained in the raw curve can be successfully removed, as most
of the lag is induced by the different sampling rates.
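
The contribution of asynchronous sampling to a saturation curve of this kind can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the procedure used for figure [7]: two Brownian motions with instantaneous correlation `rho` are generated on a fine grid, each is observed through an independent Poisson-like sampling with hypothetical rates `lam_i` and `lam_j`, and the previous-tick correlation is measured at several scales. All function names, the grid step and the parameter values are illustrative choices.

```python
import numpy as np

def previous_tick(path, observed):
    """Forward-fill `path` with the value at the most recent observed step."""
    idx = np.where(observed, np.arange(path.size), 0)
    return path[np.maximum.accumulate(idx)]

def epps_curve(rho, lam_i, lam_j, scales, horizon=100_000.0, step=0.1, seed=0):
    """Correlation of previous-tick returns at each scale (assumed toy model)."""
    rng = np.random.default_rng(seed)
    n = int(round(horizon / step))
    z = rng.standard_normal((2, n))
    # Synchronous Brownian motions with instantaneous correlation rho.
    x_i = np.cumsum(np.sqrt(step) * z[0])
    x_j = np.cumsum(np.sqrt(step) * (rho * z[0] + np.sqrt(1.0 - rho**2) * z[1]))
    # Each grid step carries an observation with probability 1 - exp(-lam * step).
    seen_i = rng.random(n) < 1.0 - np.exp(-lam_i * step)
    seen_j = rng.random(n) < 1.0 - np.exp(-lam_j * step)
    seen_i[0] = seen_j[0] = True  # make sure the first value is defined
    p_i, p_j = previous_tick(x_i, seen_i), previous_tick(x_j, seen_j)
    curve = []
    for dt in scales:
        k = int(round(dt / step))
        r_i, r_j = np.diff(p_i[::k]), np.diff(p_j[::k])
        curve.append(np.corrcoef(r_i, r_j)[0, 1])
    return np.array(curve)

scales = [1.0, 10.0, 60.0, 300.0]  # seconds
curve = epps_curve(rho=0.7, lam_i=0.1, lam_j=0.1, scales=scales)
for dt, c in zip(scales, curve):
    print(f"dt = {dt:6.1f} s   correlation = {c:.3f}")
```

In this toy setting the measured correlation is strongly suppressed at scales shorter than the mean waiting time and approaches `rho` only at much longer scales, which is the part of the Epps effect attributable to sampling alone.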

Within this approach it is also necessary to estimate from empirical data the nature
of the waiting time distribution, as it usually deviates from the exponential one which is
assumed. The effect of the deviations must then be evaluated to ensure the consistency
of the procedure previously described. We analyzed this issue by simulating a set of
synchronous time series of known spectrum and sampling them at points extracted from
real data; the spectra of those series were then systematically checked against the analytical
predictions obtained for an exponential waiting time distribution. On the right side of
figure [4] we compare the effect of an exponential sampling with the real one, finding that
no significant difference is induced by fluctuations of $\lambda_i$.



In this paper we investigated the time-dependence of financial correlations and their
decay at very high frequency (Epps effect), showing that some simple models of stochastic
processes are able to describe these features. We found that in the case of exponentially
sampled data the impact of asynchrony on correlations can be analytically controlled, and
its contribution can be exactly evaluated. We also found that within this framework one
can successfully describe some features of the empirical correlations observed in the NYSE
market, namely the heterogeneity of the price change predictability and the presence of a
causal structure in the cross-correlations. The first feature is detected as a broad
distribution of widths both in the auto- and cross-correlation functions of the assets, and can be
explained by taking into account the effect of the sampling. The second one is quantified
by the lag of cross-correlation functions, and again can be almost completely justified by
including the sampling effect. Finally, we found that a significant fraction of the Epps effect
cannot be explained as just due to the effect of asynchrony, indicating that other kinds of
effects, conjectured in [16] to be related to time scales of human reaction, contribute to the
observed dynamics of correlations.

Appendix A. Effect of an exponential random sampling 

We now turn to prove the properties which allow us to analytically account for the effect 
of the random sampling. 

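To prove property P1, we consider a multivariate synchronous process $X^i_{\Delta t}$, and let the
asynchronous sampling be induced by a waiting time distribution $p_i(t) = \lambda_i e^{-\lambda_i t}$, as
described in section [3]. We want to show that for such a process the covariance can be
computed using the substitution:

$$S^{ij}_\omega \to S^{ij}_\omega \, \frac{\lambda_i \lambda_j}{(\lambda_i + i\omega)(\lambda_j - i\omega)}$$

where $S^{ij}_\omega$ is the spectrum of the synchronous process. This can be seen by directly
computing the covariance, which is by definition (with $t_1, t'_1 \le 0$ the last sampling times
before the beginning of the interval $[0, \Delta t]$, and $t_2, t'_2$ the last sampling times inside it):

$$C^{ij}_{\Delta t} \doteq \langle dx^i_t \, dx^j_t \rangle = \lambda_i^2 \lambda_j^2 \int_{-\infty}^{0} dt_1 \, dt'_1 \int_{0}^{\Delta t} dt_2 \, dt'_2 \, \Big\langle \int_{t_1}^{t_2} dX^i \int_{t'_1}^{t'_2} dX^j \Big\rangle \, e^{\lambda_i(-\Delta t + t_1 + t_2)} \, e^{\lambda_j(-\Delta t + t'_1 + t'_2)}$$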

where we have used the symmetry with respect to time inversion of the exponential waiting 
time distribution. The expression in parenthesis can be written in Fourier space as: 

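$$\Big\langle \int_{t_1}^{t_2} dX^i \int_{t'_1}^{t'_2} dX^j \Big\rangle = \frac{1}{2\pi} \int_{t_1}^{t_2} dt \int_{t'_1}^{t'_2} dt' \int d\omega \, S^{ij}_\omega \, e^{i\omega(t - t')}$$

And the two time integrals can be performed, leading to:

$$\frac{1}{2\pi} \int d\omega \, \frac{S^{ij}_\omega}{\omega^2} \left(e^{i\omega t_2} - e^{i\omega t_1}\right) \left(e^{-i\omega t'_2} - e^{-i\omega t'_1}\right)$$

Now one can integrate over the waiting time measure, getting as a final expression:

$$C^{ij}_{\Delta t} = \frac{1}{2\pi} \int d\omega \, \frac{S^{ij}_\omega}{\omega^2} \, \frac{\lambda_i \lambda_j}{(\lambda_i + i\omega)(\lambda_j - i\omega)} \left(e^{i\omega\Delta t} - 1\right) \left(e^{-i\omega\Delta t} - 1\right)$$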

which is identical to equation ([1]) obtained in the synchronous case, except for the
substitution $S^{ij}_\omega \to S^{ij}_\omega \, \frac{\lambda_i \lambda_j}{(\lambda_i + i\omega)(\lambda_j - i\omega)}$.
In this last step the presence of an exponential sampling is crucial
to obtain a convolution as the result of the computation, as the dependence of the above
integrand on $\Delta t$ requires a cancellation; in particular one can see that the exponential
waiting time distribution is the only one producing a convolution as the result of this last
integration. It is also important to remark that independence between the sampling process
and the underlying time series has been implicitly assumed throughout our construction.
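
The cancellation mentioned above can be checked numerically for a single asset. The sketch below is an illustration under the assumptions of this appendix (Poisson sampling independent of the underlying series); the function names and parameter values are hypothetical. It draws the last sampling time before the start of an interval of length `dt` and the last sampling time before its end, and compares the Monte Carlo average of $e^{i\omega t_2} - e^{i\omega t_1}$ with $\lambda (e^{i\omega\Delta t} - 1)/(\lambda + i\omega)$, the single-asset factor entering the substitution.

```python
import numpy as np

def sampled_phase_average(lam, omega, dt, n=400_000, seed=0):
    """Monte Carlo average of exp(i*omega*t2) - exp(i*omega*t1) over Poisson sampling."""
    rng = np.random.default_rng(seed)
    age_start = rng.exponential(1.0 / lam, n)  # time since the last tick at the interval start
    age_end = rng.exponential(1.0 / lam, n)    # time since the last tick at the interval end
    t1 = -age_start
    # If no tick fell inside [0, dt], the value seen at the end is the one seen at the start.
    t2 = np.where(age_end < dt, dt - age_end, t1)
    return np.mean(np.exp(1j * omega * t2) - np.exp(1j * omega * t1))

def kernel_prediction(lam, omega, dt):
    """Single-asset factor of the substitution times the synchronous phase difference."""
    return lam / (lam + 1j * omega) * (np.exp(1j * omega * dt) - 1.0)

lam, omega, dt = 0.5, 1.3, 2.0  # illustrative values
estimate = sampled_phase_average(lam, omega, dt)
exact = kernel_prediction(lam, omega, dt)
print(abs(estimate - exact))
```

The terms proportional to $e^{-\lambda\Delta t}$ generated by the two sampling times cancel in the average, which is why the result factorizes into the synchronous expression times a kernel that does not depend on $\Delta t$.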
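P2 can also be proved by directly calculating the variance. In particular, if given a
synchronous process of spectrum $S_\omega$ one builds an asynchronous process of sampling rate
$\lambda$, its variance is given by:

$$C_{\Delta t} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} d\omega \, \frac{S_\omega}{\omega^2} \, \lambda^2 \int_0^{\infty} d\tau_1 \int_0^{\Delta t} d\tau_2 \; e^{-\lambda(\tau_1 + \tau_2)} \left[ 2 - e^{-i\omega(\Delta t - \tau_2 + \tau_1)} - e^{i\omega(\Delta t - \tau_2 + \tau_1)} \right]$$

which results in:

$$C_{\Delta t} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} d\omega \, \frac{S_\omega}{\omega^2} \left(e^{-i\omega\Delta t} - 1\right) \left(e^{i\omega\Delta t} - 1\right) + \frac{1}{2\pi} \int_{-\infty}^{+\infty} d\omega \, \frac{S_\omega}{\lambda^2 + \omega^2} \left(e^{i\omega\Delta t} + e^{-i\omega\Delta t} - 2 e^{-\lambda\Delta t}\right)$$
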



Finally, if variances in the synchronous case are linear (i.e. $\langle (X_{\Delta t})^2 \rangle = \sigma^2 \Delta t$), or
equivalently if $S_\omega$ is constant, then in the asynchronous case they are not modified, as one can
see by computing the correcting term in equation ([6]), which vanishes in this case.
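
This last statement can be verified with a short Monte Carlo experiment. The sketch below is only an illustration under simplifying assumptions (a Brownian motion of volatility `sigma`, observed at the last Poisson tick before each end of an interval of length `dt`); the function name and the parameter values are hypothetical. It estimates the variance of the sampled increment for several sampling rates and compares it with the synchronous value $\sigma^2 \Delta t$.

```python
import numpy as np

def sampled_variance(lam, dt, sigma=1.0, n=1_000_000, seed=0):
    """Variance of previous-tick increments of a Brownian motion over an interval dt."""
    rng = np.random.default_rng(seed)
    age_start = rng.exponential(1.0 / lam, n)  # time since the last tick at the interval start
    age_end = rng.exponential(1.0 / lam, n)    # time since the last tick at the interval end
    # Time elapsed between the two ticks seen at the ends of the interval;
    # it is zero when no tick falls inside the interval.
    elapsed = np.where(age_end < dt, dt - age_end + age_start, 0.0)
    increments = sigma * np.sqrt(elapsed) * rng.standard_normal(n)
    return np.mean(increments**2)

dt = 3.0
variances = {lam: sampled_variance(lam, dt) for lam in (0.2, 1.0, 5.0)}
for lam, var in variances.items():
    print(f"lambda = {lam:4.1f}   variance = {var:.3f}   sigma^2 * dt = {dt:.3f}")
```

Within statistical error the estimated variance does not depend on the sampling rate: intervals containing no tick contribute a zero increment, and this is compensated on average by the longer time elapsed between ticks in the remaining intervals.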



Appendix B. Calculation of variance and covariance 

Given a synchronous process X\, we are interested in calculating the quantities C^ t 
and C% t defined as in equation |l| and ^ in some representative cases. Indeed we will 
write the expression for the asynchronous covariance only, considering exponential waiting
time processes of rates $\lambda_i$, as the corresponding expressions for the synchronous case can
be obtained by taking the limit $\lambda_i \to \infty$ in the resulting formulas. Let us consider for the
synchronous process a spectrum of the kind:



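$$S^{ij}_\omega = \frac{e^{i\omega\tau}}{1 + \omega^2 \xi^2}$$

where we assume $\tau > 0$ and $\xi > 0$, consistently with the assumption of correlations of
the kind $c^{ij}_{t-t'} \propto e^{-|t - t' - \tau|/\xi}$, in which a lag and an exponential decay are superimposed.
Then one can calculate, using equation ([1]) and the substitution of Appendix A:

$$C^{ij}_{\Delta t} = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{d\omega}{\omega^2} \, \frac{\left(e^{-i\omega\Delta t} - 1\right) \left(e^{i\omega\Delta t} - 1\right) e^{i\omega\tau}}{(1 + \omega^2 \xi^2)(1 + i\omega/\lambda_i)(1 - i\omega/\lambda_j)}$$
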



The above integral can be solved by integration in the complex plane after choosing an
appropriate contour. In particular the integral can be written as:

n+OO 



1 

2^r 
2-e 



iuiAt 



,—iujAt 



qlj 



uj- 



qij 



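Then the full integral can be split into two (diverging) parts, whose values can be calculated
by residues. In particular, while the integral of $A_\omega$ can always be found by choosing a
semicircular contour closed in the upper half of the complex plane, to integrate $B_\omega$ it is
necessary to close the contour according to the sign of $\Delta t - \tau$. Then the result splits into:

$$C^{ij}_{\Delta t} = \mathrm{Res}_{i/\xi} A_\omega + \mathrm{Res}_{i\lambda_i} A_\omega - \mathrm{Res}_{-i/\xi} B_\omega - \mathrm{Res}_{-i\lambda_j} B_\omega + \mathrm{Res}_{0} (A_\omega - B_\omega)/2$$

for $\Delta t > \tau$, and:

$$C^{ij}_{\Delta t} = \mathrm{Res}_{i/\xi} A_\omega + \mathrm{Res}_{i\lambda_i} A_\omega + \mathrm{Res}_{i/\xi} B_\omega + \mathrm{Res}_{i\lambda_i} B_\omega + \mathrm{Res}_{0} (A_\omega + B_\omega)/2$$

for $\Delta t < \tau$, where $\mathrm{Res}_{z_0} f_z$ denotes the residue of $f_z$ in $z_0$. For $\Delta t > \tau$ this reads: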



At - r + A" 1 - AT 1 + XiXj^ 3 



-(At-r)/£ 

2uiVj 



VjU 



+ 



-(At+T)/€ 

2v,iUj 



+ 



+ 



Aj(Ai + Xj)uiVi 
while for $\Delta t < \tau$ it is:



(2 - e" 



-A,At\ 



Aj(Aj + Xj)ujVj 



u At 



cosh(At/£) - 1) + 



2A ie - A ' T 
Ai(Aj + Xj)uiVi 



[1 - cosh(AiAt)) , 



where $u_i = 1 + \lambda_i \xi$ and $v_i = -1 + \lambda_i \xi$. Formulas given in the examples of sections [2] and [3]
can be recovered from this expression by taking the appropriate limits.



Appendix C. Finite size effects 

The construction described in sections [2] and [3] can be generalized to the case of finite
size time series in discrete time ($t = 1, \dots, T$): after having defined the discrete Fourier
transform of the series $dX^i_t$ as:

$$dX^i_n = \sum_{t=1}^{T} dX^i_t \, e^{2\pi i n t/T}, \qquad dX^i_t = \frac{1}{T} \sum_{n=0}^{T-1} dX^i_n \, e^{-2\pi i n t/T}$$

and the spectrum as:

$$S^{ij}_n = \langle dX^i_n \, (dX^j_n)^* \rangle$$

it is possible to consider an asynchronous sampling defined through a rate $\lambda_i$, so that the
probability of sampling the time series at time $t$ is uniform and equal to $1 - e^{-\lambda_i}$. In this
case the sampling induces an analogous effect, and the substitution of Appendix A becomes:

$$S^{ij}_n \to S^{ij}_n \, \frac{(1 - e^{-\lambda_i})(1 - e^{-\lambda_j})}{(1 - e^{-\lambda_i + 2\pi i n/T})(1 - e^{-\lambda_j - 2\pi i n/T})}$$

The filtering procedure described in section [4] can still be applied in this case, where it is
affected by a finite error: correlations in real time have a minimum resolution which scales
as $T^{-1/2}$, as noise effects on the measured spectrum fix the accuracy of the reconstructed
signal.
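
The discrete substitution can be written down directly as a frequency-dependent kernel. The sketch below is a minimal illustration, not the filter of section [4] itself: the function names are hypothetical, and dividing a measured spectrum by the kernel is an assumed way of undoing the sampling effect, which ignores the noise that limits the accuracy in practice.

```python
import numpy as np

def sampling_kernel(lam_i, lam_j, T):
    """Factor multiplying the synchronous spectrum at each discrete frequency n = 0..T-1."""
    phase = 2j * np.pi * np.arange(T) / T
    numerator = (1.0 - np.exp(-lam_i)) * (1.0 - np.exp(-lam_j))
    denominator = (1.0 - np.exp(-lam_i + phase)) * (1.0 - np.exp(-lam_j - phase))
    return numerator / denominator

def filter_spectrum(measured, lam_i, lam_j):
    """Assumed inverse step: divide the measured spectrum by the sampling kernel."""
    return measured / sampling_kernel(lam_i, lam_j, measured.size)

T = 64
kernel = sampling_kernel(0.3, 0.8, T)      # illustrative rates per time step
flat = np.full(T, 0.5 + 0j)                # spectrum of delta-correlated increments
recovered = filter_spectrum(flat * kernel, 0.3, 0.8)
print(abs(kernel[0]), np.max(np.abs(recovered - flat)))
```

The kernel equals one at zero frequency, so the long-time limit of the correlations is untouched by the sampling, and it tends to one at all frequencies when both rates are large, recovering the synchronous case.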